Support soft target in softmax_cross_entropy #5595
Conversation
Or, would it be better to implement this as a different function, such as softmax_kl_divergence?
I've fixed the PR so that it uses the argument
The design and implementation look good. I added some minor comments.
        t_type.ndim == x_type.ndim - 1,
        if x_type.ndim == t_type.ndim and x_type.shape == t_type.shape:
            # assume t is soft_target
            self.soft_target = True
Keep check_type_forward free of side effects. This method may be skipped by setting CHAINER_TYPE_CHECK=0.
        x_type.dtype.kind == 'f',
        t_type.dtype.kind == 'i',
        t_type.ndim == x_type.ndim - 1,
        if x_type.ndim == t_type.ndim and x_type.shape == t_type.shape:
I feel it's better to branch based on dtype kind and then check ndim/shape with expect. It will produce an error message that matches the user's intent.
    def _soft_target_loss(self, xp, x, t, log_y):
        kl_d = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
        if self.reduce == 'mean':
            self._coeff = 1.0 / (numpy.prod(x.shape) / x.shape[1])
-            self._coeff = 1.0 / (numpy.prod(x.shape) / x.shape[1])
+            self._coeff = 1.0 / (x.size / x.shape[1])
size can be used to get the total number of elements.
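For reference, a quick standalone check (independent of the PR) that the two expressions agree; size is a plain attribute lookup, while numpy.prod recomputes the product from the shape tuple:

```python
import numpy as np

x = np.zeros((4, 3, 5, 5), dtype=np.float32)
# Both give the total number of elements (4 * 3 * 5 * 5 = 300),
# but x.size is simpler and returns a plain Python int.
assert x.size == np.prod(x.shape) == 300
```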
            return kl_d.reshape(()),
        else:
            shape = (x.shape[0],) + x.shape[2:]
            return kl_d.reshape(shape),
Why is this reshape needed?
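For context, after the sum over the class axis (axis=1) the result already has shape (x.shape[0],) + x.shape[2:], so the reshape appears to be a no-op. A quick numpy check, independent of Chainer (shapes chosen only for illustration):

```python
import numpy as np

x = np.random.randn(4, 3, 5, 5).astype(np.float32)   # (N, C, H, W) logits
t = np.random.rand(4, 3, 5, 5).astype(np.float32)    # soft targets, same shape as x
log_y = np.log(np.random.rand(4, 3, 5, 5) + 1e-7)

kl_d = np.sum(t * (np.log(t + 1e-7) - log_y), axis=1)
print(kl_d.shape)                    # (4, 5, 5)
print((x.shape[0],) + x.shape[2:])   # (4, 5, 5) -- the reshape target
```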
        self.check_backward_options = {}

    def check_forward(self, xp):
        pass
-        pass
+        raise NotImplementedError
Use raise NotImplementedError to ensure this method is overridden.
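A minimal sketch of the suggested pattern (the class and method names follow the test file; the bodies are illustrative): the base class raises, so a concrete test class that forgets to override check_forward fails loudly instead of silently passing.

```python
import unittest

class BaseSoftTarget(object):
    def check_forward(self, xp):
        # Force subclasses to provide their own forward check.
        raise NotImplementedError

class TestSoftTarget(BaseSoftTarget, unittest.TestCase):
    def check_forward(self, xp):
        # Real assertions on the forward output go here.
        pass
```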
        t_hard_shape = (self.nb,) + self.shape[1:]
        self.t_hard = numpy.random.randint(
            0, self.shape[0], t_hard_shape).astype(numpy.int32)
        t = numpy.zeros(numpy.prod(self.x.shape)).astype(self.dtype)
-        t = numpy.zeros(numpy.prod(self.x.shape)).astype(self.dtype)
+        t = numpy.zeros(self.x.size).astype(self.dtype)
Thanks for your comments. I've just fixed the branch based on your feedback. Please review it again.
Thank you for the updates. It looks good to me. Could you add a description of the soft target support to the docstring?
I've just added a description of the soft target support. Please check it again.
Could you resolve the merge conflicts?
Sorry for the late reply. I've just resolved the conflicts with the master branch. Could you check it again?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs. Thank you for your contributions.
Bump. Sorry for the late response. Could you resolve the conflict again?
I've just resolved the conflicts with the master branch. Could you check it?
Sorry for the late reply. The code looks good to me, but I now have a concern about the naming. A function named "cross entropy" that actually computes KL divergence is confusing and will quite likely be misused. You asked above whether it would be better to name it
@@ -223,21 +244,32 @@ def forward_gpu(self, inputs):
        ret = ret.reshape(t.shape)
        return ret,

    def _soft_target_loss(self, xp, x, t, log_y):
        kl_d = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
To compute the cross entropy:
-        kl_d = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
+        ____ = -xp.sum(t * log_y, axis=1)
Now I'm wondering if this is correct, since it fails at the test below.
chainer/tests/chainer_tests/functions_tests/loss_tests/test_softmax_cross_entropy.py, line 618 in 1ab2632:
    class TestSoftTargetExpectNearZero(BaseSoftTarget, unittest.TestCase):
This test uses the output of softmax as the soft target label, so the output of softmax_cross_entropy is expected to be zero or almost zero. But softmax_cross_entropy returns a non-zero value when 'cross-entropy' is used for the soft target loss calculation.
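To make the numerical point concrete: cross entropy decomposes as H(t, y) = KL(t || y) + H(t), so when the soft target t equals softmax(x), the KL term vanishes but the cross entropy equals the (non-zero) entropy of t. A small numpy check, independent of Chainer:

```python
import numpy as np

logits = np.random.randn(4, 10)
y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax(x)
t = y.copy()                                                    # soft target equals the prediction
log_y = np.log(y)

kl = np.sum(t * (np.log(t) - log_y), axis=1)  # KL(t || y): essentially zero
ce = -np.sum(t * log_y, axis=1)               # H(t, y) = H(t): clearly non-zero

print(kl.max())   # ~1e-16, so the KL-based loss passes the near-zero test
print(ce.mean())  # the entropy of t, so the cross-entropy loss does not
```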
Done in 1ab2632.
BTW,
All right, I added an option 'soft_target_loss' so that you can choose which loss calculation method to use for the soft target loss: 'cross-entropy' or 'kl-divergence'. What do you think about this option?
Thanks for the fix!
Jenkins CI test (for commit 1ab2632, target branch master) failed with status FAILURE.
        if self.soft_target_loss == 'kl-divergence':
            ret = xp.sum(t * (xp.log(t + self.eps) - log_y), axis=1)
        else:
            ret = -xp.sum(t * log_y), axis=1)
It looks like there is a syntax error here.
Ah, sorry, I was careless. I will fix it soon.
Jenkins, test this please.
Jenkins CI test (for commit f847214, target branch master) failed with status FAILURE.
It looks like the test is still failing. Could you check it?
The CI test fails at chainer/tests/chainer_tests/functions_tests/loss_tests/test_softmax_cross_entropy.py, line 621 in f847214. The test expects the loss value to be zero, but it becomes non-zero when 'cross-entropy' is used to compute the soft target loss. What do you think about this?
Sorry for being very late.
Jenkins and flexCI, test this please.
Jenkins CI test (for commit 620b55d, target branch master) succeeded!
LGTM
Thank you!!!
Thank you for merging the PR!
This PR aims to support "soft target" in softmax_cross_entropy.
The current softmax_cross_entropy implementation supports a "hard target" (integer class labels) but not a "soft target" (a per-sample distribution over classes), which is becoming popular as a method to mitigate overfitting. This PR allows users to pass a soft target to softmax_cross_entropy as sketched below.
The soft target loss is KL divergence.
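A hedged usage sketch (the exact API is defined by the merged PR; the soft_target_loss argument is the option discussed in the review above, and the shapes and values here are only illustrative). When t is a float array with the same shape as x, it is treated as a per-sample class distribution:

```python
import numpy as np
import chainer.functions as F

x = np.random.randn(8, 10).astype(np.float32)            # logits
teacher = np.random.randn(8, 10).astype(np.float32)      # e.g. a teacher model's logits
t_soft = F.softmax(teacher).array                         # soft targets; each row sums to 1

# Hard target usage is unchanged: t holds integer class indices.
t_hard = np.random.randint(0, 10, size=(8,)).astype(np.int32)
loss_hard = F.softmax_cross_entropy(x, t_hard)

# Soft target usage added by this PR; 'kl-divergence' selects the loss
# described here, 'cross-entropy' the alternative discussed in review.
loss_soft = F.softmax_cross_entropy(x, t_soft, soft_target_loss='kl-divergence')
```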